Accusoft.SmartZoneOCR4.Net
Select a Field Type

Selecting and Checking Field Type

SmartZone OCR provides predefined masks, called field types, that define the expected format of your data. If you know your data matches one of these formats, specifying the field type (using the Reader.FieldType property of SmartZoneOCR) makes your recognition results more accurate.

The FieldType setting assists text recognition: SmartZone OCR makes a greater effort to match the result to the format supported by that field type. It is very important to select the correct field type for an image. If you do not know the format, use GeneralText.

If the field type determined by SmartZone OCR matches the field type you specified on input, the same value is returned in the FieldType property of the TextBlockResult. Otherwise, Unknown is returned as the FieldType value, which means the result did not conform to the expected format of the FieldType specified on input. For example, if the Date FieldType is specified but SmartZone OCR reads "01~23-47", the FieldType Unknown is returned.
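The confirmation behavior described above can be pictured with a small standalone Python sketch. It is not SmartZone OCR code: the `PATTERNS` table, the `confirm_field_type` helper, and the single simplified date pattern are all hypothetical, chosen only to show how a result that fails the requested format would be reported as Unknown.

```python
import re

# Hypothetical format table for illustration only; the real SmartZone OCR
# rules are richer than this single simplified date pattern.
PATTERNS = {
    "Date": re.compile(r"\d{1,2}[-/]\d{1,2}[-/]\d{2}(\d{2})?"),
}

def confirm_field_type(text, requested):
    """Return the requested type if the text fits its pattern, else "Unknown"."""
    pattern = PATTERNS.get(requested)
    if pattern is not None and pattern.fullmatch(text):
        return requested
    return "Unknown"

print(confirm_field_type("01-23-47", "Date"))  # Date
print(confirm_field_type("01~23-47", "Date"))  # Unknown
```

In an application, the analogous check would be comparing the FieldType of the TextBlockResult against the value you set on input and routing Unknown results to manual review.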

Supported Field Types

The following field types are available:

Field Type Description
Currency The supported currency symbols, currency punctuation, and digits:  $ ¢ £ ¥ € , . ` - = 0123456789. Supported formats place the currency symbol in front of the digits, with commas and periods as separator characters and as the decimal separator. The € symbol may also be placed to the right of the rightmost digit.
Currency Plus The supported alphabetic abbreviations for currency symbols, currency punctuation, and digits:  USD GBP EUR E DKK Dkr KR NOK Nkr SEK Sk $ ¢ £ ¥ € , . ` - = 0123456789.
Data Validation Lists

A data validation list is a set of possible expected results. Using a data validation list as a field type improves recognition results by narrowing the possible answers returned by character recognition when there is an ambiguity or conflict. An example of a data validation list is a list of two-character US state abbreviations, from AL to WY.

See Define and Edit Data Validation Lists for more information.
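As a rough illustration of how a list can narrow ambiguous answers, the Python sketch below expands per-character alternatives and keeps only the combinations found in the list. The `resolve_with_list` helper and the shape of its input are assumptions made for this example; the source does not describe how SmartZone OCR applies the list internally.

```python
from itertools import product

# The 50 two-character US state abbreviations used as the validation list.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)

def resolve_with_list(alternatives, allowed):
    """Keep only the character combinations that appear in the allowed list.

    alternatives holds one list of candidate characters per position
    (a hypothetical stand-in for ambiguous recognition output).
    """
    combos = ("".join(chars) for chars in product(*alternatives))
    return sorted(c for c in combos if c in allowed)

# "T" or "7" followed by "X" or "K": only TX is a listed value.
print(resolve_with_list([["T", "7"], ["X", "K"]], US_STATES))  # ['TX']
```

When more than one combination survives (for example "C" or "G" followed by "A"), the list alone cannot settle the ambiguity.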

Date

MM-DD-YY  MM-DD-YYYY  MM/DD/YY  MM/DD/YYYY
M-D-YY  MM/DD-YY  M/D-YYYY  MM/DD-YYYY
M/D-YY

DD-MM-YY  DD-MM-YYYY  DD/MM/YY  DD/MM/YYYY
D-M-YY  DD/MM-YY  D/M-YYYY  DD/MM-YYYY
D/M-YY

YY-MM-DD  YYYY-MM-DD  YY/MM/DD  YYYY/MM/DD
YY-M/D  YYYY-M/D  YYYY-MM/DD  YYYY-M-D

Email

The local name and the domain name are evaluated separately, using @ as the delimiter. Each may contain any of these ASCII characters:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
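The character rules above can be expressed as a short standalone check. The Python sketch below is only an illustration of those rules; the helper names `part_is_well_formed` and `looks_like_email` are hypothetical and are not part of SmartZone OCR.

```python
import re

# Letters, digits, the listed special characters, and the dot.
ALLOWED = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~.]+")

def part_is_well_formed(part):
    """Apply the character and dot-placement rules to one side of the @."""
    if not ALLOWED.fullmatch(part):
        return False
    # A dot may not be first or last, and may not appear twice in a row.
    return not (part.startswith(".") or part.endswith(".") or ".." in part)

def looks_like_email(text):
    """Split at the first @ and evaluate each side separately."""
    local, sep, domain = text.partition("@")
    return sep == "@" and part_is_well_formed(local) and part_is_well_formed(domain)

print(looks_like_email("john.doe@example.com"))   # True
print(looks_like_email("john..doe@example.com"))  # False
```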

General Text

All the supported characters in English, French, Spanish, Italian, German, Dutch, Portuguese, Norwegian, Finnish, Danish, and Swedish.

Regular Expression

A regular expression is a pattern in the form of a string that describes or matches the format of expected results, according to certain rules.  The advantage of using regular expressions as a field type is to improve recognition results by narrowing the possible answers returned by character recognition in the event of an ambiguity or conflict. A number of SmartZone's built-in field types already use regular expressions. For example, US Social Security Number is a regular expression of the form

\d{3}-?\d{2}-?\d{4}

See Regular Expressions for more examples and detailed syntax.
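To see what the pattern shown above accepts, it can be tried outside SmartZone OCR. The sketch below uses Python's `re` module, whose syntax agrees with this particular pattern; the `matches_ssn_pattern` helper is a hypothetical name, and whole-string matching is an assumption made for the example.

```python
import re

# The pattern quoted in the text: three digits, two digits, four digits,
# with each hyphen optional.
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")

def matches_ssn_pattern(text):
    """True when the whole string fits the pattern."""
    return SSN_PATTERN.fullmatch(text) is not None

for sample in ["123-45-6789", "123456789", "123-45-678", "123.45.6789"]:
    print(sample, matches_ssn_pattern(sample))
```

As written, the pattern allows hyphens only, so a dotted value such as 123.45.6789 does not match it.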

Social Security Number

999-99-9999
99-99999
999.99.9999
99.99999
999999999
9999999
999 99 9999
99 99999

Time

HH:MM:SS    HH.MM.SS    HH:MM:SS am/pm   HH.MM.SS am/pm

Where:

  • HH is between 00 and 24
  • MM is between 00 and 59
  • SS, if provided, is between 00 and 59
  • am/pm, if provided, may be in lowercase or uppercase, with or without a period after each letter.
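The range rules above can be sketched as a standalone Python check. This is an illustration under assumptions, not SmartZone OCR's implementation: the `looks_like_time` helper is hypothetical, and the requirement that both separators be the same character is inferred from the listed formats.

```python
import re

TIME_RE = re.compile(
    r"(?P<hh>\d{2})(?P<sep>[:.])(?P<mm>\d{2})"  # HH and MM with : or .
    r"(?:(?P=sep)(?P<ss>\d{2}))?"               # optional SS, same separator
    r"(?:\s*[AaPp]\.?[Mm]\.?)?"                 # optional am/pm, periods optional
)

def looks_like_time(text):
    """Check the shape first, then the numeric ranges from the list above."""
    m = TIME_RE.fullmatch(text.strip())
    if m is None:
        return False
    hh, mm = int(m["hh"]), int(m["mm"])
    ss = int(m["ss"]) if m["ss"] is not None else 0
    return 0 <= hh <= 24 and 0 <= mm <= 59 and 0 <= ss <= 59

print(looks_like_time("23:59:59"))       # True
print(looks_like_time("12.30.45 p.m."))  # True
print(looks_like_time("25:00:00"))       # False
```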

United States Phone Number

Digits 0-9, ( ), -, /, and an extension indicator.

Phone numbers may be formatted with or without the leading 1 and with or without the area code.

1 (999) 999-9999   (999) 999-9999    999-9999  

1 (999) 999/9999  999-999-9999   999/999/9999  999-999/9999

Use ext, EXT, X, or x as the extension indicator, followed to its right by the extension number of two to four digits.
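The phone formats and the extension rule can be summarized in one pattern. The Python sketch below is a hypothetical approximation built from the examples listed here; the `looks_like_us_phone` helper and the handling of spaces around the extension indicator are assumptions, not documented SmartZone OCR behavior.

```python
import re

PHONE_RE = re.compile(
    r"(?:1\s)?"                            # optional leading 1
    r"(?:\(\d{3}\)\s|\d{3}[-/])?"          # optional area code: "(999) ", "999-", "999/"
    r"\d{3}[-/]\d{4}"                      # seven-digit local number
    r"(?:\s*(?:ext|EXT|X|x)\s*\d{2,4})?"   # optional extension, two to four digits
)

def looks_like_us_phone(text):
    """True when the whole string fits one of the sketched formats."""
    return PHONE_RE.fullmatch(text.strip()) is not None

for sample in ["1 (999) 999-9999", "999-999/9999", "999-9999 ext 123", "999-9999 x12345"]:
    print(sample, looks_like_us_phone(sample))
```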

URL

http://www.name
http://www.name.name
https://www.name
https://www.name.name

Supported extensions include:

  • com
  • edu
  • gov
  • net
  • any two-letter extension

For the best recognition accuracy, set the character set to the narrowest set that includes all possible returned values, then indicate the expected format of the recognition results by applying the field types listed here. Field types improve recognition by defining the number of characters or digits and the format of the expected results, which allows SmartZone OCR to choose more reliably among several possible returned values.

 


©2013. Accusoft Corporation. All Rights Reserved.